A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning
Forgetting refers to the loss or deterioration of previously acquired
information or knowledge. While the existing surveys on forgetting have
primarily focused on continual learning, forgetting is a prevalent phenomenon
observed in various other research domains within deep learning. Forgetting
manifests in research fields such as generative models due to generator shifts,
and federated learning due to heterogeneous data distributions across clients.
Addressing forgetting encompasses several challenges, including balancing the
retention of old task knowledge with fast learning of new tasks, managing task
interference with conflicting goals, and preventing privacy leakage, etc.
Moreover, most existing surveys on continual learning implicitly assume that
forgetting is always harmful. In contrast, our survey argues that forgetting is
a double-edged sword and can be beneficial and desirable in certain cases, such
as privacy-preserving scenarios. By exploring forgetting in a broader context,
we aim to present a more nuanced understanding of this phenomenon and highlight
its potential advantages. Through this comprehensive survey, we aspire to
uncover potential solutions by drawing upon ideas and approaches from various
fields that have dealt with forgetting. By examining forgetting beyond its
conventional boundaries, in future work, we hope to encourage the development
of novel strategies for mitigating, harnessing, or even embracing forgetting in
real applications. A comprehensive list of papers about forgetting in various
research fields is available at
\url{https://github.com/EnnengYang/Awesome-Forgetting-in-Deep-Learning}
Continual Learning From a Stream of APIs
Continual learning (CL) aims to learn new tasks without forgetting previous
tasks. However, existing CL methods require a large amount of raw data, which
is often unavailable due to copyright considerations and privacy risks.
Instead, stakeholders usually release pre-trained machine learning models as a
service (MLaaS), which users can access via APIs. This paper considers two
practical-yet-novel CL settings: data-efficient CL (DECL-APIs) and data-free CL
(DFCL-APIs), which achieve CL from a stream of APIs with partial or no raw
data. Performing CL under these two new settings faces several challenges:
unavailable full raw data, unknown model parameters, heterogeneous models of
arbitrary architecture and scale, and catastrophic forgetting of previous APIs.
To overcome these issues, we propose a novel data-free cooperative continual
distillation learning framework that distills knowledge from a stream of APIs
into a CL model by generating pseudo data, just by querying APIs. Specifically,
our framework includes two cooperative generators and one CL model, forming
their training as an adversarial game. We first use the CL model and the
current API as fixed discriminators to train generators via a derivative-free
method. Generators adversarially generate hard and diverse synthetic data to
maximize the response gap between the CL model and the API. Next, we train the
CL model by minimizing the gap between the responses of the CL model and the
black-box API on synthetic data, to transfer the API's knowledge to the CL
model. Furthermore, we propose a new regularization term based on network
similarity to prevent catastrophic forgetting of previous APIs. Our method
performs comparably to classic CL with full raw data on MNIST and SVHN in
the DFCL-APIs setting. In the DECL-APIs setting, our method achieves 0.97x,
0.75x, and 0.69x the performance of classic CL on CIFAR10, CIFAR100, and
MiniImageNet, respectively.
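To make the distillation step concrete, below is a minimal sketch of how a CL model could be trained to match a black-box API's responses on generator-produced pseudo data. The `api_predict` callable, generator architecture, and KL-based objective are illustrative assumptions rather than the paper's exact implementation, and the derivative-free training of the generators is omitted here.

```python
# Hedged sketch: `api_predict`, the generator architecture, and the KL
# objective are illustrative stand-ins, not the paper's implementation.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class Generator(nn.Module):
    """Maps noise vectors to pseudo images used to query the API."""
    def __init__(self, noise_dim=100, img_shape=(1, 28, 28)):
        super().__init__()
        self.img_shape = img_shape
        self.net = nn.Sequential(
            nn.Linear(noise_dim, 256), nn.ReLU(),
            nn.Linear(256, math.prod(img_shape)), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z).view(-1, *self.img_shape)

def distill_step(cl_model, api_predict, generator, optimizer,
                 batch=64, noise_dim=100):
    """One student update: minimize the response gap between the CL model
    and the black-box API on synthetic data (no raw data needed)."""
    z = torch.randn(batch, noise_dim)
    with torch.no_grad():
        x = generator(z)                              # pseudo data
        teacher = F.softmax(api_predict(x), dim=1)    # API response only
    student_log = F.log_softmax(cl_model(x), dim=1)
    loss = F.kl_div(student_log, teacher, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```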
ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation
Multimodal recommendation aims to model user and item representations
comprehensively with the involvement of multimedia content for effective
recommendations. Existing research has shown that it is beneficial for
recommendation performance to combine (user- and item-) ID embeddings with
multimodal salient features, indicating the value of IDs. However, the
literature lacks a thorough analysis of ID embeddings in terms of feature
semantics. In this paper, we revisit the value of ID embeddings for
multimodal recommendation and conduct a thorough study of their semantics,
which we recognize as subtle features of content and structures. Then, we
propose a novel recommendation model by incorporating ID embeddings to enhance
the semantic features of both content and structures. Specifically, we put
forward a hierarchical attention mechanism to incorporate ID embeddings in
modality fusing, coupled with contrastive learning, to enhance content
representations. Meanwhile, we propose a lightweight graph convolutional
network for each modality to amalgamate neighborhood and ID embeddings for
improving structural representations. Finally, the content and structure
representations are combined to form the ultimate item embedding for
recommendation. Extensive experiments on three real-world datasets (Baby,
Sports, and Clothing) demonstrate the superiority of our method over
state-of-the-art multimodal recommendation methods and the effectiveness of
fine-grained ID embeddings.
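As an illustration of the general idea (not the paper's exact architecture), the sketch below uses an item's ID embedding as the query in a small attention layer over projected modality features, so the ID guides modality fusion; the hierarchical attention, contrastive objective, and per-modality graph convolutions described above are omitted.

```python
# Illustrative sketch: the item ID embedding serves as the query in an
# attention layer over projected modality features. Dimensions are
# hypothetical placeholders (e.g., 4096-d image, 384-d text features).
import torch
import torch.nn as nn
import torch.nn.functional as F

class IDGuidedFusion(nn.Module):
    def __init__(self, id_dim=64, modal_dims=(4096, 384), hidden=64):
        super().__init__()
        # One projection per modality (e.g., image, text) into a shared space.
        self.proj = nn.ModuleList([nn.Linear(d, hidden) for d in modal_dims])
        self.query = nn.Linear(id_dim, hidden)

    def forward(self, id_emb, modal_feats):
        # modal_feats: list of (batch, modal_dim) tensors, one per modality.
        keys = torch.stack([p(f) for p, f in zip(self.proj, modal_feats)], dim=1)
        q = self.query(id_emb).unsqueeze(1)                  # (batch, 1, hidden)
        attn = F.softmax((q * keys).sum(-1) / keys.size(-1) ** 0.5, dim=1)
        fused = (attn.unsqueeze(-1) * keys).sum(dim=1)       # content part
        return fused + q.squeeze(1)                          # add ID signal

# Usage: fuse a batch of ID, visual, and textual features into item embeddings.
fusion = IDGuidedFusion()
item_repr = fusion(torch.randn(8, 64), [torch.randn(8, 4096), torch.randn(8, 384)])
```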
AdaTask: A Task-aware Adaptive Learning Rate Approach to Multi-task Learning
Multi-task learning (MTL) models have demonstrated impressive results in
computer vision, natural language processing, and recommender systems. Even
though many approaches have been proposed, how well these approaches balance
different tasks on each parameter remains unclear. In this paper, we
propose to measure the task dominance degree of a parameter by the total
updates of each task on this parameter. Specifically, we compute the total
updates by the exponentially decaying Average of the squared Updates (AU) on a
parameter from the corresponding task. Based on this novel metric, we observe
that many parameters in existing MTL methods, especially those in the higher
shared layers, are still dominated by one or several tasks. The dominance of AU
is mainly due to the dominance of accumulative gradients from one or several
tasks. Motivated by this, we propose a Task-wise Adaptive learning rate
approach, AdaTask for short, to separate the \emph{accumulative gradients} and
hence the learning rate of each task for each parameter in adaptive learning
rate approaches (e.g., AdaGrad, RMSProp, and Adam). Comprehensive experiments
on computer vision and recommender system MTL datasets demonstrate that AdaTask
significantly improves the performance of dominated tasks, resulting in SOTA
average task-wise performance. Analysis on both synthetic and real-world
datasets shows that AdaTask balances parameters in every shared layer well.
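A hedged sketch of the task-wise idea, built on plain AdaGrad rather than the paper's exact algorithm: each task keeps its own squared-gradient accumulator per shared parameter, so one dominant task's accumulated updates cannot shrink the effective learning rate of the other tasks. The accumulation rule and hyperparameters below are illustrative only.

```python
# Hedged sketch of task-wise adaptive learning rates on top of AdaGrad.
# The paper also covers RMSProp and Adam; this is not its exact update rule.
import torch

class TaskwiseAdaGrad:
    def __init__(self, params, num_tasks, lr=0.01, eps=1e-10):
        self.params = list(params)
        self.lr, self.eps = lr, eps
        # One squared-gradient accumulator per (task, parameter) pair.
        self.accum = [[torch.zeros_like(p) for p in self.params]
                      for _ in range(num_tasks)]

    def step(self, task_grads):
        """task_grads[t][i]: gradient of task t's loss w.r.t. parameter i."""
        with torch.no_grad():
            for i, p in enumerate(self.params):
                update = torch.zeros_like(p)
                for t, grads in enumerate(task_grads):
                    g = grads[i]
                    self.accum[t][i] += g * g          # per-task accumulation
                    update += g / (self.accum[t][i].sqrt() + self.eps)
                p -= self.lr * update
```

Per-task gradients of the shared parameters can be collected with one backward pass per task, e.g. `torch.autograd.grad(loss_t, shared_params, retain_graph=True)` for each task loss.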
Uniform Sequence Better: Time Interval Aware Data Augmentation for Sequential Recommendation
Sequential recommendation is an important task that predicts the next item to access based on a sequence of interacted items. Most existing works learn user preference as the transition pattern from the previous item to the next one, ignoring the time interval between the two items. However, we observe that time intervals within a sequence may vary significantly, resulting in ineffective user modeling due to preference drift. We conducted an empirical study to validate this observation and found that a sequence with uniformly distributed time intervals (denoted as a uniform sequence) is more beneficial for performance than one with greatly varying time intervals. Therefore, we propose to augment sequence data from the perspective of time intervals, which has not been studied in the literature. Specifically, we design five operators (Ti-Crop, Ti-Reorder, Ti-Mask, Ti-Substitute, Ti-Insert) to transform the original non-uniform sequence into a uniform sequence while accounting for the variance of time intervals. We then devise a control strategy to execute data augmentation on item sequences of different lengths. Finally, we implement these improvements on the state-of-the-art model CoSeRec and validate our approach on four real datasets. The experimental results show that our approach achieves significantly better performance than 9 competing methods. Our implementation is available at: https://github.com/KingGugu/TiCoSeRec
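To illustrate the flavor of these operators, here is a hedged sketch of a Ti-Crop-style transformation: it keeps the contiguous subsequence whose timestamps have the smallest time-interval variance, approximating a uniform sequence. The window length and tie-breaking are illustrative choices and may differ from the released TiCoSeRec implementation.

```python
# Hedged sketch of a Ti-Crop-style operator: select the contiguous window
# with minimal variance of successive time intervals.
import numpy as np

def ti_crop(items, timestamps, crop_len):
    """Return the length-`crop_len` contiguous slice of (items, timestamps)
    whose successive time intervals have the smallest variance."""
    assert len(items) == len(timestamps) >= crop_len >= 2
    intervals = np.diff(timestamps)
    best_start, best_var = 0, np.inf
    for start in range(len(items) - crop_len + 1):
        # A window of crop_len items spans crop_len - 1 intervals.
        var = np.var(intervals[start:start + crop_len - 1])
        if var < best_var:
            best_start, best_var = start, var
    s = best_start
    return items[s:s + crop_len], timestamps[s:s + crop_len]

# Example: the most evenly spaced window is selected.
items = ["a", "b", "c", "d", "e"]
ts = [0, 10, 20, 30, 100]
print(ti_crop(items, ts, 3))   # (['a', 'b', 'c'], [0, 10, 20])
```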